@AdrianLundell
Collaborator

  • Removes uses of torch_manual_seed which previously fixed the random state.
  • Adds pytest plugin pytest-rerunfailures to mark flaky tests.
  • Refactors flaky tests to use data generators instead of pregenerated data, which ensures that data is re-randomized between reruns.
  • Updates the layer_norm testcase to use the same qtol value for TOSA/EthosU targets.
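
To illustrate the data-generator point, here is a minimal sketch of the difference between pregenerated data and a generator that is called when the test runs. It is not the PR's code: the names `make_data` and `run_case` are hypothetical, and plain Python floats stand in for the torch tensors the real tests use so that the sketch has no dependencies.

```python
import random
from typing import Callable, List

# Hypothetical type for a test-data entry. The real tests use torch tensors;
# plain floats keep this sketch dependency-free.
DataGenerator = Callable[[], List[float]]

# Pregenerated: values are drawn once at import time, so every rerun of a
# test in the same process sees exactly the same input.
pregenerated_data: List[float] = [random.random() for _ in range(4)]


def make_data() -> List[float]:
    """Generator: fresh values are drawn each time the test body calls it."""
    return [random.random() for _ in range(4)]


def run_case(generate: DataGenerator) -> List[float]:
    """Hypothetical test body: materialise the input only when the test runs."""
    return generate()


first = run_case(make_data)
second = run_case(make_data)  # a rerun draws new values
```

Passing the callable rather than its result is what lets a rerun exercise different inputs.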

Note that because the random state is no longer fixed, we may see more flakiness in CI. This will have to be addressed with the flaky mark on a case-by-case basis over time.
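
For reference, a sketch of what marking a single test as flaky could look like with pytest-rerunfailures. The test name, its body, and the value `reruns=3` are illustrative assumptions, not taken from this PR.

```python
import random

import pytest


# pytest-rerunfailures reruns a failing test up to `reruns` times before
# reporting it as failed. reruns=3 is an illustrative value.
@pytest.mark.flaky(reruns=3)
def test_random_input_in_range():
    # Hypothetical test: the input is drawn inside the test body, so each
    # rerun sees a new value.
    x = random.random()
    assert 0.0 <= x < 1.0
```

The mark only has an effect when the plugin is installed; the plugin also provides a `--reruns` command-line option that applies to every test in a run.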

Change-Id: I15aa8b517bec2a748b93b0d74e09e2f48df40926
@AdrianLundell AdrianLundell added partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm ciflow/trunk topic: not user facing labels Jan 15, 2025
@pytorch-bot

pytorch-bot bot commented Jan 15, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7669

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job, 1 Pending

As of commit 738a696 with merge base 4796da7:

NEW FAILURE - The following job has failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 15, 2025
@AdrianLundell
Collaborator Author

Looking into the unittest-arm/linux-job errors; the others are unrelated:
ERROR: No matching distribution found for torchtune==0.4.0.dev20241112

@AdrianLundell
Collaborator Author

I'll have to hold off on this until CI is fixed; do not merge.

@zingo
Collaborator

zingo commented Jan 20, 2025

The macOS and phi failures seem unrelated.

@zingo
Collaborator

zingo commented Jan 20, 2025

@digantdesai This touches a file outside the arm folder (e.g. pyproject.toml). Is it OK to merge?

@AdrianLundell
Collaborator Author

Now only unrelated CI errors are left!

@zingo zingo requested a review from digantdesai January 23, 2025 13:43
@AdrianLundell
Collaborator Author

@digantdesai What do you think, is it ok to add the new pytest-rerunfailures dependency?

@zingo zingo merged commit a373925 into pytorch:main Jan 31, 2025
104 of 106 checks passed
@AdrianLundell AdrianLundell deleted the change-969691 branch February 11, 2025 09:24